RGeneration RMarkdown Overview

Hello there!

My name is Mick O Donnell and I will take you through a tour of what RMarkdown can do for your work or research with R.

So… what is it?

RMarkdown is an extremely powerful, yet simple to implement micro-language which enables the mixing of analysis code with a narrative which can explain the code.

You’ve probably heard of html or xml… (Bonus points for knowing what the letters stand for!!!). Well, markdown is a well established simple text layout language, which has been built upon to create a language which is tailored for use with R.

It was first developed in 2012 by Yihui Xie - you can find a lot more detail on the language in his book which outlines how you can author documents such as books or dissertations in RMarkdown. Xie now works for RStudio and has developed an amazingly versatile and extensible system, which spans production of all forms of communication, from websites (blogdown) and slides to dissertations and journal articles.

The best way to learn it is to give yourself a project to do with it. I wrote an MSc Thesis in it, and while it was a challenge, it very much suited the research, as I wanted to achieve a reproducible analysis workflow - and I got to learn a new skill…!

My dissertation

My dissertation

It is fully integrated with RStudio, with options to create all forms of RMarkdown media, buttons to action the various commands needed to “knit” a markdown document, and allowing customisation of the type of output you want. While RMarkdown is a native web format, and using it to knit a html document means you can exploit all the power of web (javascript based) packages such as leaflet and plotly, it is also possible to output to pdf and even Microsoft Word formats.

One thing I would highly advise is to print out and laminate the handy cheatsheets available here. They’re invaluable and should be on the desk of any R user.

Analysis

For the purpose of exposing some of the features and convenience of RMarkdown for reproducible analysis, we will look at a recently released dataset - the Valuation Office released as open data information relating to commercial property for several counties around Ireland. You can learn more about this here.

We will import the data directly as a csv table from the api - the query string is obtained via the query engine on the Valutaion Office website. You can specify several different parameters such as the county, what fields you want to drawdown and what categories of business you are interested in. I have specified all the data for Dublin City Council.

if (!file.exists("data/valoff_dcc.csv")) {
  valoff_dcc <- read_csv("https://api.valoff.ie/api/Property/GetProperties?Fields=*&LocalAuthority=DUBLIN%20CITY%20COUNCIL&CategorySelected=OFFICE%2CFUEL%2FDEPOT%2CLEISURE%2CINDUSTRIAL%20USES%2CHEALTH%2CHOSPITALITY%2CMINERALS%2CMISCELLANEOUS%2CRETAIL%20(SHOPS)%2CUTILITY%2CRETAIL%20(WAREHOUSE)%2CNO%20CATEGORY%20SELECTED%2CCENTRAL%20VALUATION%20LIST%2CCHECK%20CATEGORY%2CNON-LIST%2CNON-LIST%20EXEMPT&Format=csv&Download=true")

  valoff_dcc %>% write_csv("data/valoff_dcc.csv")
} else valoff_dcc <- read_csv("data/valoff_dcc.csv")
## Parsed with column specification:
## cols(
##   .default = col_character(),
##   `Property Number` = col_double(),
##   Valuation = col_double(),
##   `Car Park` = col_double(),
##   `X ITM` = col_double(),
##   `Y ITM` = col_double(),
##   Area = col_double(),
##   `Nav Per M2` = col_double(),
##   Nav = col_double()
## )
## See spec(...) for full column specifications.
valoff_dcc %<>% 
  mutate(prop_num = as.character(`Property Number`)) %>% 
  select(-`Property Number`) %>% 
  select(prop_num, everything())

This results in a table with 49786 rows and 20 columns. It might be helpful to inspect the data.

Inspect Data

str(valoff_dcc)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 49786 obs. of  20 variables:
##  $ Property Number : num  837643 837643 791577 791577 791577 ...
##  $ Publication Date: chr  "31/12/2013" "31/12/2013" "31/12/2013" "31/12/2013" ...
##  $ County          : chr  "DUBLIN" "DUBLIN" "DUBLIN" "DUBLIN" ...
##  $ Local Authority : chr  "DUBLIN CITY COUNCIL" "DUBLIN CITY COUNCIL" "DUBLIN CITY COUNCIL" "DUBLIN CITY COUNCIL" ...
##  $ Valuation       : num  0 0 89100 89100 89100 89100 89100 89100 89100 89100 ...
##  $ Category        : chr  "CENTRAL VALUATION LIST" "CENTRAL VALUATION LIST" "RETAIL (SHOPS)" "RETAIL (SHOPS)" ...
##  $ Uses            : chr  "NETWORK (TELECOM), TELEPHONE (EXCHANGE)" "NETWORK (TELECOM), TELEPHONE (EXCHANGE)" "CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE" "CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE" ...
##  $ Address 1       : chr  "12 DENZILLE LANE" "12 DENZILLE LANE" "47-48 Lower Baggot Street" "47-48 Lower Baggot Street" ...
##  $ Address 2       : chr  "DUBLIN 2" "DUBLIN 2" "Dublin 2" "Dublin 2" ...
##  $ Address 3       : chr  NA NA NA NA ...
##  $ Address 4       : chr  NA NA NA NA ...
##  $ Address 5       : chr  NA NA NA NA ...
##  $ Car Park        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ X ITM           : num  316704 316704 316733 316733 316733 ...
##  $ Y ITM           : num  233735 233735 233230 233230 233230 ...
##  $ Level           : chr  "1" "-1" "2" "2" ...
##  $ Floor Use       : chr  "OFFICE(S)" "CAR SPACES (NO. OF)" "TOILETS" "STORE" ...
##  $ Area            : num  220 2 24.7 64.5 33.1 ...
##  $ Nav Per M2      : num  0 0 100 100 120 132 175 350 700 1 ...
##  $ Nav             : num  0 0 2467 6452 3967 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   `Property Number` = col_double(),
##   ..   `Publication Date` = col_character(),
##   ..   County = col_character(),
##   ..   `Local Authority` = col_character(),
##   ..   Valuation = col_double(),
##   ..   Category = col_character(),
##   ..   Uses = col_character(),
##   ..   `Address 1` = col_character(),
##   ..   `Address 2` = col_character(),
##   ..   `Address 3` = col_character(),
##   ..   `Address 4` = col_character(),
##   ..   `Address 5` = col_character(),
##   ..   `Car Park` = col_double(),
##   ..   `X ITM` = col_double(),
##   ..   `Y ITM` = col_double(),
##   ..   Level = col_character(),
##   ..   `Floor Use` = col_character(),
##   ..   Area = col_double(),
##   ..   `Nav Per M2` = col_double(),
##   ..   Nav = col_double()
##   .. )
valoff_dcc %>% inspect_cat()
## # A tibble: 12 x 5
##    col_name         cnt common                  common_pcnt levels         
##    <chr>          <int> <chr>                         <dbl> <list>         
##  1 Address 1      16033 SHOP                          0.809 <tibble [16,03~
##  2 Address 2       6627 Dublin 2                      5.85  <tibble [6,627~
##  3 Address 3       1223 <NA>                         31.8   <tibble [1,223~
##  4 Address 4        285 <NA>                         78.5   <tibble [285 x~
##  5 Address 5         68 <NA>                         95.9   <tibble [68 x ~
##  6 Category          14 RETAIL (SHOPS)               46.7   <tibble [14 x ~
##  7 County             1 DUBLIN                      100     <tibble [1 x 3~
##  8 Floor Use        161 OFFICE(S)                    33.2   <tibble [161 x~
##  9 Level             49 0                            60.9   <tibble [49 x ~
## 10 Local Authori~     1 DUBLIN CITY COUNCIL         100     <tibble [1 x 3~
## 11 Publication D~     1 31/12/2013                  100     <tibble [1 x 3~
## 12 Uses             792 OFFICE (GEORGIAN / VIC~       8.62  <tibble [792 x~

So we have a column which serves as a Property Identifier, some columns with geographical info (Local Authority, County, Address, X ITM and Y ITM which are coordinate values) as well as information which describes the category of use (Category and Uses, Floor Use), the area (Area) and valuation attributed to the record (Nav per M2 and Nav).

Lets have a look at the first 20 records… I’m using the knitr::kable() function in conjuction with kableExtra() which allows customisation of table options. This is ideal for html output, but causes some issues with pdf output, if I recall correctly - some styling options are only available for html. A lot more detail on customisation of kableExtra() available here. kableExtra uses applies the twitter bootstrap theme to html tables, with oodles of customisation.

valoff_dcc %>% head(20) %>% kable() %>% kable_styling(bootstrap_options = "striped", full_width = T)
prop_num Publication Date County Local Authority Valuation Category Uses Address 1 Address 2 Address 3 Address 4 Address 5 Car Park X ITM Y ITM Level Floor Use Area Nav Per M2 Nav
837643 31/12/2013 DUBLIN DUBLIN CITY COUNCIL 0 CENTRAL VALUATION LIST NETWORK (TELECOM), TELEPHONE (EXCHANGE) 12 DENZILLE LANE DUBLIN 2 NA NA NA 0 316703.8 233735.3 1 OFFICE(S) 220.00 0.0 0.00
837643 31/12/2013 DUBLIN DUBLIN CITY COUNCIL 0 CENTRAL VALUATION LIST NETWORK (TELECOM), TELEPHONE (EXCHANGE) 12 DENZILLE LANE DUBLIN 2 NA NA NA 0 316703.8 233735.3 -1 CAR SPACES (NO. OF) 2.00 0.0 0.00
791577 31/12/2013 DUBLIN DUBLIN CITY COUNCIL 89100 RETAIL (SHOPS) CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE 47-48 Lower Baggot Street Dublin 2 NA NA NA 0 316732.8 233229.5 2 TOILETS 24.67 100.0 2467.00
791577 31/12/2013 DUBLIN DUBLIN CITY COUNCIL 89100 RETAIL (SHOPS) CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE 47-48 Lower Baggot Street Dublin 2 NA NA NA 0 316732.8 233229.5 2 STORE 64.52 100.0 6452.00
791577 31/12/2013 DUBLIN DUBLIN CITY COUNCIL 89100 RETAIL (SHOPS) CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE 47-48 Lower Baggot Street Dublin 2 NA NA NA 0 316732.8 233229.5 1 STORE 33.06 120.0 3967.20
791577 31/12/2013 DUBLIN DUBLIN CITY COUNCIL 89100 RETAIL (SHOPS) CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE 47-48 Lower Baggot Street Dublin 2 NA NA NA 0 316732.8 233229.5 1 RESTAURANT 101.93 132.0 13454.76
791577 31/12/2013 DUBLIN DUBLIN CITY COUNCIL 89100 RETAIL (SHOPS) CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE 47-48 Lower Baggot Street Dublin 2 NA NA NA 0 316732.8 233229.5 0 RETAIL ZONE C 14.35 175.0 2511.25
791577 31/12/2013 DUBLIN DUBLIN CITY COUNCIL 89100 RETAIL (SHOPS) CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE 47-48 Lower Baggot Street Dublin 2 NA NA NA 0 316732.8 233229.5 0 RETAIL ZONE B 36.62 350.0 12817.00
791577 31/12/2013 DUBLIN DUBLIN CITY COUNCIL 89100 RETAIL (SHOPS) CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE 47-48 Lower Baggot Street Dublin 2 NA NA NA 0 316732.8 233229.5 0 RETAIL ZONE A 53.54 700.0 37478.00
791577 31/12/2013 DUBLIN DUBLIN CITY COUNCIL 89100 RETAIL (SHOPS) CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE 47-48 Lower Baggot Street Dublin 2 NA NA NA 0 316732.8 233229.5 0 OFF LICENCE 10000.00 1.0 10000.00
2135648 31/12/2013 DUBLIN DUBLIN CITY COUNCIL 71500 RETAIL (SHOPS) CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE GROUND FLOOR + BASEMENT 64 O’CONNELL ST. UPPER DUBLIN 1 NA NA 0 315863.2 234710.4 -1 STORE 74.46 80.0 5956.80
2135648 31/12/2013 DUBLIN DUBLIN CITY COUNCIL 71500 RETAIL (SHOPS) CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE GROUND FLOOR + BASEMENT 64 O’CONNELL ST. UPPER DUBLIN 1 NA NA 0 315863.2 234710.4 0 RETAIL ZONE REMAINDER 36.34 100.0 3634.00
2135648 31/12/2013 DUBLIN DUBLIN CITY COUNCIL 71500 RETAIL (SHOPS) CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE GROUND FLOOR + BASEMENT 64 O’CONNELL ST. UPPER DUBLIN 1 NA NA 0 315863.2 234710.4 0 RETAIL ZONE C 36.95 200.0 7390.00
2135648 31/12/2013 DUBLIN DUBLIN CITY COUNCIL 71500 RETAIL (SHOPS) CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE GROUND FLOOR + BASEMENT 64 O’CONNELL ST. UPPER DUBLIN 1 NA NA 0 315863.2 234710.4 0 RETAIL ZONE B 46.03 400.0 18412.00
2135648 31/12/2013 DUBLIN DUBLIN CITY COUNCIL 71500 RETAIL (SHOPS) CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE GROUND FLOOR + BASEMENT 64 O’CONNELL ST. UPPER DUBLIN 1 NA NA 0 315863.2 234710.4 0 RETAIL ZONE A 33.53 800.0 26824.00
2135648 31/12/2013 DUBLIN DUBLIN CITY COUNCIL 71500 RETAIL (SHOPS) CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE GROUND FLOOR + BASEMENT 64 O’CONNELL ST. UPPER DUBLIN 1 NA NA 0 315863.2 234710.4 0 OFF LICENCE 1.00 9330.0 9330.00
716154 31/12/2013 DUBLIN DUBLIN CITY COUNCIL 14970 RETAIL (SHOPS) CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE 82 Ballymun Road Dublin 9 NA NA NA 0 315436.3 237962.1 0 RETAIL ZONE A 36.79 300.0 11037.00
716154 31/12/2013 DUBLIN DUBLIN CITY COUNCIL 14970 RETAIL (SHOPS) CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE 82 Ballymun Road Dublin 9 NA NA NA 0 315436.3 237962.1 0 RETAIL ZONE B 12.39 150.0 1858.50
716154 31/12/2013 DUBLIN DUBLIN CITY COUNCIL 14970 RETAIL (SHOPS) CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE 82 Ballymun Road Dublin 9 NA NA NA 0 315436.3 237962.1 0 STORE 3.90 30.0 117.00
716154 31/12/2013 DUBLIN DUBLIN CITY COUNCIL 14970 RETAIL (SHOPS) CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE 82 Ballymun Road Dublin 9 NA NA NA 0 315436.3 237962.1 0 OFF LICENCE 1.00 1963.5 1963.50

So, as we can see, we have multiple properties per row - can anyone guess what each row might represent?

valoff_dcc %>% 
  select(prop_num, Valuation, Category, Uses) %>% 
  distinct(prop_num, .keep_all = TRUE) -> rates_properties # 20306

rates_properties %>% 
  head(50) %>% 
  kable() %>% 
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = T)
prop_num Valuation Category Uses
837643 0 CENTRAL VALUATION LIST NETWORK (TELECOM), TELEPHONE (EXCHANGE)
791577 89100 RETAIL (SHOPS) CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE
2135648 71500 RETAIL (SHOPS) CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE
716154 14970 RETAIL (SHOPS) CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE
841510 51300 RETAIL (SHOPS) CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE
2119783 18390 RETAIL (SHOPS) CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE
1484301 78600 RETAIL (SHOPS) CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE
853072 34200 RETAIL (SHOPS) CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE
2193101 91200 RETAIL (SHOPS) CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE
853528 89500 RETAIL (SHOPS) CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE
766233 29700 RETAIL (SHOPS) ETHNIC FOOD SHOP, OFF-LICENCE
766621 11020 RETAIL (SHOPS) ETHNIC FOOD SHOP, OFF-LICENCE
852714 450000 HOSPITALITY PUB, OFF-LICENCE
766619 40000 HOSPITALITY PUB, OFF-LICENCE
818727 110000 HOSPITALITY PUB, OFF-LICENCE
1031290 44200 HOSPITALITY PUB, OFF-LICENCE
714197 70200 RETAIL (SHOPS) SUPERMARKET 1 [200-500 SQ. M.], OFF-LICENCE
688917 259000 RETAIL (SHOPS) SUPERMARKET 2 [500-2500 SQ. M.], OFF-LICENCE
808950 46400 RETAIL (SHOPS) SHOP, OFF-LICENCE
814595 50900 RETAIL (SHOPS) CLOTHES SHOP, CHARITY SHOP
814592 34500 RETAIL (SHOPS) CLOTHES SHOP, CHARITY SHOP
1030481 4960 OFFICE OFFICE (OVER THE SHOP), HAIRDRESSING SALON
666082 3770 OFFICE OFFICE (OVER THE SHOP), HAIRDRESSING SALON
612387 6290 OFFICE OFFICE (OVER THE SHOP), HAIRDRESSING SALON
600833 3370 OFFICE OFFICE (OVER THE SHOP), HAIRDRESSING SALON
599648 5460 OFFICE OFFICE (OVER THE SHOP), HAIRDRESSING SALON
695162 6580 OFFICE OFFICE (OVER THE SHOP), HAIRDRESSING SALON
714316 6240 OFFICE OFFICE (OVER THE SHOP), HAIRDRESSING SALON
731816 3790 OFFICE OFFICE (OVER THE SHOP), HAIRDRESSING SALON
711012 5980 OFFICE OFFICE (OVER THE SHOP), HAIRDRESSING SALON
819039 4660 OFFICE OFFICE (OVER THE SHOP), HAIRDRESSING SALON
715223 2750 OFFICE OFFICE (OVER THE SHOP), HAIRDRESSING SALON
715995 3960 OFFICE OFFICE (OVER THE SHOP), HAIRDRESSING SALON
719403 4060 OFFICE OFFICE (OVER THE SHOP), HAIRDRESSING SALON
666145 4230 OFFICE OFFICE (OVER THE SHOP), HAIRDRESSING SALON
787968 5720 OFFICE OFFICE (OVER THE SHOP), HAIRDRESSING SALON
5017136 12320 OFFICE OFFICE (OVER THE SHOP), HAIRDRESSING SALON
2147280 8060 OFFICE OFFICE (OVER THE SHOP), HAIRDRESSING SALON
786969 6110 OFFICE OFFICE (OVER THE SHOP), HAIRDRESSING SALON
786967 8450 OFFICE OFFICE (OVER THE SHOP), HAIRDRESSING SALON
715138 6240 OFFICE OFFICE (OVER THE SHOP), HAIRDRESSING SALON
5009508 5780 OFFICE OFFICE (OVER THE SHOP), HAIRDRESSING SALON
785379 8290 LEISURE SPORTS & LEISURE CENTRE, HAIRDRESSING SALON
715211 2720 OFFICE OFFICE (OWN DOOR), HAIRDRESSING SALON
856372 43900 RETAIL (SHOPS) ETHNIC FOOD SHOP, HAIRDRESSING SALON
813050 4360 OFFICE OFFICE (OVER THE SHOP), HAIRDRESSING SALON
2166706 6510 OFFICE OFFICE (OVER THE SHOP), HAIRDRESSING SALON
719415 5620 OFFICE OFFICE (OVER THE SHOP), HAIRDRESSING SALON
594955 3260 OFFICE OFFICE (OVER THE SHOP), HAIRDRESSING SALON
666083 3670 OFFICE OFFICE (OVER THE SHOP), HAIRDRESSING SALON

There are 20306 different properties listed for Dublin City Council. Let’s see how the categorical variables breakdown…

rates_properties %>% 
  summarise(average_RV = mean(Valuation),
            total_RV = sum(Valuation),
            number_of_properties = n()) %>% 
  kable()%>% 
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = T)
average_RV total_RV number_of_properties
63713.82 1293772817 20306
rates_properties %>% 
  count(Uses, sort = TRUE) %>% 
  head(50) %>% 
  kable() %>% 
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = T)
Uses n
OFFICE (GEORGIAN / VICTORIAN), - 1683
OFFICE (OVER THE SHOP), - 1356
OFFICE (3RD GENERATION), - 1200
WAREHOUSE, - 1061
WORKSHOP, - 804
SHOP, - 712
OFFICE (OWN DOOR), - 648
RESTAURANT, - 590
PUB, - 465
STORE, - 448
HAIRDRESSING SALON, - 414
CLOTHES SHOP, - 355
OFFICE (2ND GENERATION), - 351
TAKE AWAY, - 298
OFFICE (BUSINESS PARK), - 251
OFFICE (HOUSE), - 231
CAFE, - 228
PHARMACY, - 228
OFFICE (HOUSE), OFFICE (GEORGIAN / VICTORIAN) 202
SHOP (OFFICES), - 198
OFFICE, - 193
BETTING SHOP, - 178
CONVENIENCE STORE [<200 SQ. M.], - 166
WAREHOUSE, OFFICE 160
NEWSAGENT, - 157
OFFICE (3RD GENERATION), 150
CAR PARK (OFFICE), - 147
RIGHT OF TRADING, - 146
ADVERTISING STATION, SHEET 48 141
VACANT, - 136
OFFICE (1ST GENERATION), - 135
OFFICE (MEWS), - 123
BEAUTY SALON / MASSAGE, - 119
DRY CLEANERS / LAUNDERETTE, - 111
BUTCHER, - 103
CHECK USE, 102
OFFICE (INDUSTRIAL), - 102
CAR PARK (SURFACE), - 101
OFFICE (GEORGIAN / VICTORIAN), 99
JEWELLERS, - 98
HOTEL, - 96
HARDWARE / DIY, - 89
BANK, - 88
OFFICE (HOUSE), SURGERY 88
OFFICE (OFFICE PARK), - 81
FACTORY, - 78
AUCTIONEER, - 74
COFFEE SHOP, - 74
KIOSK, - 73
SHOE SHOP, - 73
rates_properties %>% 
  count(Category, sort = TRUE) %>%  
  head(50) %>% 
  kable() %>% 
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = T)
Category n
OFFICE 7660
RETAIL (SHOPS) 7184
INDUSTRIAL USES 3332
MISCELLANEOUS 822
HOSPITALITY 672
LEISURE 214
CHECK CATEGORY 105
RETAIL (WAREHOUSE) 93
HEALTH 83
FUEL/DEPOT 69
NO CATEGORY SELECTED 44
CENTRAL VALUATION LIST 13
UTILITY 12
NON-LIST 3
rates_properties %>%
  count(Category, Uses, sort = TRUE) -> category_use_table

category_use_table %>%  
  head(50) %>% 
  kable() %>% 
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = T)
Category Uses n
OFFICE OFFICE (GEORGIAN / VICTORIAN), - 1683
OFFICE OFFICE (OVER THE SHOP), - 1356
OFFICE OFFICE (3RD GENERATION), - 1197
INDUSTRIAL USES WAREHOUSE, - 1061
INDUSTRIAL USES WORKSHOP, - 804
RETAIL (SHOPS) SHOP, - 712
OFFICE OFFICE (OWN DOOR), - 648
RETAIL (SHOPS) RESTAURANT, - 590
HOSPITALITY PUB, - 465
RETAIL (SHOPS) HAIRDRESSING SALON, - 414
INDUSTRIAL USES STORE, - 364
RETAIL (SHOPS) CLOTHES SHOP, - 355
OFFICE OFFICE (2ND GENERATION), - 351
RETAIL (SHOPS) TAKE AWAY, - 298
OFFICE OFFICE (BUSINESS PARK), - 251
OFFICE OFFICE (HOUSE), - 231
RETAIL (SHOPS) CAFE, - 228
RETAIL (SHOPS) PHARMACY, - 228
OFFICE OFFICE (HOUSE), OFFICE (GEORGIAN / VICTORIAN) 202
RETAIL (SHOPS) SHOP (OFFICES), - 198
RETAIL (SHOPS) BETTING SHOP, - 178
RETAIL (SHOPS) CONVENIENCE STORE [<200 SQ. M.], - 166
INDUSTRIAL USES WAREHOUSE, OFFICE 160
RETAIL (SHOPS) NEWSAGENT, - 157
OFFICE OFFICE (3RD GENERATION), 150
MISCELLANEOUS CAR PARK (OFFICE), - 147
RETAIL (SHOPS) RIGHT OF TRADING, - 146
MISCELLANEOUS ADVERTISING STATION, SHEET 48 141
OFFICE OFFICE (1ST GENERATION), - 135
INDUSTRIAL USES OFFICE, - 129
RETAIL (SHOPS) VACANT, - 129
OFFICE OFFICE (MEWS), - 123
RETAIL (SHOPS) BEAUTY SALON / MASSAGE, - 119
RETAIL (SHOPS) DRY CLEANERS / LAUNDERETTE, - 111
RETAIL (SHOPS) BUTCHER, - 103
CHECK CATEGORY CHECK USE, 102
OFFICE OFFICE (INDUSTRIAL), - 102
MISCELLANEOUS CAR PARK (SURFACE), - 101
OFFICE OFFICE (GEORGIAN / VICTORIAN), 99
RETAIL (SHOPS) JEWELLERS, - 98
HOSPITALITY HOTEL, - 96
RETAIL (SHOPS) HARDWARE / DIY, - 89
OFFICE OFFICE (HOUSE), SURGERY 88
RETAIL (SHOPS) BANK, - 88
OFFICE OFFICE (OFFICE PARK), - 81
INDUSTRIAL USES FACTORY, - 78
RETAIL (SHOPS) STORE, - 75
RETAIL (SHOPS) AUCTIONEER, - 74
RETAIL (SHOPS) COFFEE SHOP, - 74
RETAIL (SHOPS) KIOSK, - 73

As can be seen, the principal category is “Office”, the principal use is “OFFICE (GEORGIAN / VICTORIAN)” which might be expected for the Dublin City Council area.

Valuation by Category

Let’s investigate whether the valuation of a property might be related to the category of use of the property. In order to do this we will have to control for the overall area of the property, so we will need to add the area accross all rows for the property. Before we do this, there is a bit of data cleaning needed…
* Off-licenses are treated differently - filter out * Car-spaces are treated differently - filter out * there are some negative values - treat these as positive values * remove NA floor values * anything with an area values of 1 / -1 seems to be treated differently - filter out

note that I use the %<>% assignment pipe operator in the next section - more on this at R for Data Science

valoff_dcc %>% 
  filter(!(Area == 1.0 | Area == -1.0 | Area == 0)) %>% # remove illogical Area values
  filter(!`Floor Use` %in% c("CAR SPACES (NO. OF)")) %>% # remove car space rows
  filter(!is.na(`Floor Use`)) %>% # remove NA values in Floor Space
  mutate(Area = Mod(Area)) %>% 
  select(prop_num, Valuation, Area, Category, x = `X ITM`, y = `Y ITM`) -> munged_valoff_areas # convert negative values of Area to positive

# next we sum the area values accross the property with a group process...
munged_valoff_areas %>% 
  group_by(prop_num) %>% 
  summarise(mean_val = mean(Valuation), 
            total_area = sum(Area),
            Category = first(Category),
            x = mean(x), 
            y = mean(y),
            num_areas = n()) -> val_by_cat_area

# add a column to represent the valuation per area
val_by_cat_area %<>% # note use of assignment pipe
  mutate(val_per_area = mean_val / total_area)

Plotting Valuation vs Area

The next step is to plot the data. First we’ll do a bog standard ggplot, with a scatter of valuation versus total area, coloured by Category. We need to use log scales for both area and valuation..

val_by_cat_area %>% 
  filter(mean_val > 0) %>% # remove the few 0 values in mean_val
  ggplot(aes(total_area, mean_val)) +
  geom_point(aes(color = Category)) +
  scale_x_log10(labels = number_format()) + # number format from scales is for nive labels
  scale_y_log10(labels = number_format()) +
  labs(y = "Valuation",
       x = "Total Area")

The beauty of RMarkdown is that you can leverage the full range of interactive html widgets that R supports. One of these is plotly, which has a cool function called ggplotly which allows you to pass in a ggplot object and generate an interactive plot for you. Let’s have a look…

# lets pull back some of the data we had before, such as the address..
val_by_cat_area %>% 
  left_join(valoff_dcc %>% select(prop_num, c("Address 1", "Address 2", "Address 3"))) -> val_by_cat_area_w_address
## Joining, by = "prop_num"
# put the above dataframe into the ggplot object as above
val_by_cat_area_w_address %>% 
  mutate(add1 = `Address 1`,
         add2 = `Address 2`,
         add3 = `Address 3`) %>% 
  filter(mean_val > 0) %>% # remove the few 0 values in mean_val
  sample_frac(0.2) %>% # filter for 20% of the data to reduce load on html parser
  ggplot(aes(total_area, mean_val)) +
  geom_point(aes(color = Category, Address1 = add1, Address2 = add2, Address3 = add3)) +
  scale_x_log10(labels = number_format()) + # number format from scales is for nive labels
  scale_y_log10(labels = number_format()) +
  labs(y = "Valuation",
       x = "Total Area") -> val_area_scatter
## Warning: Ignoring unknown aesthetics: Address1, Address2, Address3
ggplotly(val_area_scatter,
         tooltip = c("Address1", "Address2", "Address3"))

As you can see, the ability to incorporate an interactive chart in a document is a very useful feature.

Mapping the Data

Lets look at mapping the information.